PREPARATION PHASE (24)

USER DEFINES THE FOLLOWING:

WEB PAGE CONTENT TYPES THAT THE METHOD MUST RECOGNIZE (10)

N (COMPANY NEWS)
C (CONTACT INFORMATION)
P (PRODUCT INFORMATION)
M (MANAGEMENT TEAM)
D (COMPANY DESCRIPTION)
...etc...

SET OF TESTS THAT PROVIDE EVIDENCE ABOUT THE CONTENT TYPE (15)

T1 = "NUMBER OF EXTERNAL LINKS ON PAGE > 5"
T2 = "NUMBER OF INTERNAL LINKS > 10"
T3 = "LINK TEXT CONTAINS CONTACT KEYWORDS (e.g. ADDRESS, LOCATION, CONTACT, etc.)"
T4 = "NUMBER OF PEOPLE NAMES IN PAGE > 3"
T5 = "PAGE CONTAINS STOCK TICKER SYMBOL"
T6 = "PAGE CONTAINS HEADER STARTING WITH WORD 'ABOUT..'"
...etc...

FIG. 1 
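
The tests T1 to T6 listed in FIG. 1 can be read as true/false predicates evaluated on a page. The following is a minimal sketch under stated assumptions, not the method's actual implementation: the `Page` record, its field names, `CONTACT_KEYWORDS`, `TESTS`, and `run_tests` are all hypothetical, and the link counts, people-name count, and ticker flag are assumed to have been extracted beforehand by a step the figure does not describe.

```python
from dataclasses import dataclass, field

# Assumption: only the three keywords named in T3; the figure's "etc" is not filled in.
CONTACT_KEYWORDS = ("ADDRESS", "LOCATION", "CONTACT")


@dataclass
class Page:
    """Hypothetical pre-extracted features of one web page."""
    external_links: int = 0
    internal_links: int = 0
    link_texts: list = field(default_factory=list)
    people_names: int = 0
    has_ticker: bool = False
    headers: list = field(default_factory=list)


# Each test maps a page to True or False, mirroring T1..T6 in FIG. 1.
TESTS = {
    "T1": lambda p: p.external_links > 5,
    "T2": lambda p: p.internal_links > 10,
    "T3": lambda p: any(k in text.upper()
                        for text in p.link_texts
                        for k in CONTACT_KEYWORDS),
    "T4": lambda p: p.people_names > 3,
    "T5": lambda p: p.has_ticker,
    "T6": lambda p: any(h.strip().upper().startswith("ABOUT")
                        for h in p.headers),
}


def run_tests(page):
    """Return a {test name: bool} mapping for a single page."""
    return {name: bool(test(page)) for name, test in TESTS.items()}
```

Keeping the tests in a dictionary keyed by name makes it straightforward to add further tests where the figure says "etc" without changing the code that consumes the results.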
TRAINING PHASE (50)

TRAINING SET OF WEB PAGES WITH KNOWN CONTENTS

CONTENT TYPES FOR EACH WEB PAGE IN THE TRAINING SET (20)

TEST RESULTS FOR EACH WEB PAGE IN THE TRAINING SET (22)

CALCULATE STATISTICS

PAGE    T1    T2    T3    T4
P(H=N) = 0.20
P(H=C) = 0.20
...etc...

P(T1=T|H=N) = 0.4630    P(T1=F|H=N) = 0.5370
P(T1=T|H=C) = 0.2344    P(T1=F|H=C) = 0.7656

P(T2=T|H=N) = 0.2647    P(T2=F|H=N) = 0.7353
P(T2=T|H=C) = 0.6224    P(T2=F|H=C) = 0.3776
...etc...

P(T3=T|H=N) = 0.7352    P(T3=F|H=N) = 0.2648
P(T3=T|H=C) = 0.2432    P(T3=F|H=C) = 0.7568
...etc...

...etc...
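
The figure says only "CALCULATE STATISTICS", so the exact estimator is not given. One plausible reading, sketched below, is plain relative-frequency counting over the labeled training set: the prior P(H=h) is the share of training pages of type h, and P(Ti=T|H=h) is the share of type-h pages on which test Ti came out true. The function name `calculate_statistics` and the input layout are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def calculate_statistics(training_set):
    """Estimate priors and conditional probabilities by counting.

    training_set: list of (content_type, {test_name: bool}) pairs.
    Assumption: relative frequencies without smoothing.
    """
    type_counts = Counter(h for h, _ in training_set)
    total = len(training_set)
    priors = {h: n / total for h, n in type_counts.items()}

    # true_counts[h][t] = number of type-h pages on which test t was true
    true_counts = defaultdict(Counter)
    for h, results in training_set:
        for t, outcome in results.items():
            if outcome:
                true_counts[h][t] += 1

    tests = sorted({t for _, results in training_set for t in results})
    likelihoods = {}
    for h, n in type_counts.items():
        for t in tests:
            p_true = true_counts[h][t] / n
            likelihoods[(t, True, h)] = p_true         # P(t=T | H=h)
            likelihoods[(t, False, h)] = 1.0 - p_true  # P(t=F | H=h)
    return priors, likelihoods
```

With this estimator the true and false entries for each test and type sum to 1, as they do in every pair of values shown in the figure. An estimate of exactly 0 or 1 can occur on small training sets; whether the original method guards against that is not stated.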
CLASSIFICATION PHASE (52)

SUBJECT WEB PAGE (UNKNOWN CONTENT TYPE)

TEST RESULTS FOR SUBJECT SITE

STATISTICS FROM TRAINING PHASE

P(H=N) = 0.20
P(H=C) = 0.20
...etc...

P(T1=T|H=N) = 0.4630    P(T1=F|H=N) = 0.5370
P(T1=T|H=C) = 0.2344    P(T1=F|H=C) = 0.7656

P(T2=T|H=N) = 0.2647    P(T2=F|H=N) = 0.7353
P(T2=T|H=C) = 0.6224    P(T2=F|H=C) = 0.3776
...etc...

P(T3=T|H=N) = 0.7352    P(T3=F|H=N) = 0.2648
P(T3=T|H=C) = 0.2432    P(T3=F|H=C) = 0.7568
...etc...

...etc...
CONFIDENCE LEVELS FOR EACH CONTENT TYPE

32

BAYESIAN NETWORK
(COMBINE TEST RESULTS AND CALCULATE CONFIDENCE LEVEL FOR EACH CANDIDATE TYPE)

CONTENT TYPE               CONF. LEVEL
N (COMPANY NEWS)           22%
C (CONTACT INFORMATION)    4%
P (PRODUCT INFORMATION)    89%
M (MANAGEMENT TEAM)        7%
D (COMPANY DESCRIPTION)    92%
...etc...
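
The figure does not show how the Bayesian network combines the test results. One simple way to combine them, sketched below, is a naive Bayes calculation that treats the tests as conditionally independent given the content type. That independence assumption, the name `confidence_levels`, and the normalization over the candidate types are all choices made for this illustration. Only types N and C and tests T1 to T3 are used, because those are the only statistics the figures give in full. The confidence levels tabulated above do not sum to 100%, so the network in the figure evidently scores the types differently from this normalized sketch.

```python
import math

# Statistics copied from the training-phase figure (types N and C only).
PRIORS = {"N": 0.20, "C": 0.20}
P_TRUE = {  # P(test = T | H = type)
    "N": {"T1": 0.4630, "T2": 0.2647, "T3": 0.7352},
    "C": {"T1": 0.2344, "T2": 0.6224, "T3": 0.2432},
}


def confidence_levels(test_results):
    """Naive Bayes posterior over the candidate types.

    test_results: {test_name: bool} for the subject page.
    Assumption: tests are conditionally independent given the type,
    and the candidate types are mutually exclusive.
    """
    scores = {}
    for h, prior in PRIORS.items():
        log_score = math.log(prior)
        for t, outcome in test_results.items():
            p_true = P_TRUE[h][t]
            # Use P(t=T|h) if the test passed, otherwise P(t=F|h).
            log_score += math.log(p_true if outcome else 1.0 - p_true)
        scores[h] = math.exp(log_score)
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


levels = confidence_levels({"T1": True, "T2": False, "T3": True})
```

For a page that passes T1 and T3 but fails T2, this sketch favors N over C, because each of those three outcomes is more probable under N according to the tabulated statistics.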
PREFERRED EMBODIMENT

INPUT

TEST MODULE (54)

TRAINING MODULE (50)

BAYESIAN NETWORK MODULE (52)

59

OUTPUT